NoteTaker Smalltalk Conventions

This is a working document. Send protests to: Ted Kaehler

Filed on: [IVY]<kaehler>NoteTaker.Conventions

This is version 5 on March 6, 1979

(Changes are underlined)

Object Pointer Space 

A pointer to an object is called an OOP (ordinary object pointer). The 16 bits of an oop are 
apportioned as follows: 

oop = xxxx xxxx xxxx xx00 The oop is a pointer to an object. The oop is the actual
displacement in the OT of the entry of this object. The first oop is 0 and the last actual oop
may be as big as 0EFFC hex.

oop = xxxx xxxx xxxx xxx1 This is the oop of an integer. The value of the integer is
oop/2.

oop = xxxx xxxx xxxx xx10 (Not yet specified)
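The three encodings can be told apart by testing the low two bits. Here is a minimal sketch in Python (chosen only for illustration; the real system is 8086 code). The name classify_oop is hypothetical, and treating the integer as unsigned is an assumption, since the text above does not say how a sign is handled.

```python
def classify_oop(oop):
    """Classify a 16-bit oop by its low-order bits (illustrative sketch)."""
    if not 0 <= oop <= 0xFFFF:
        raise ValueError("an oop is 16 bits")
    if oop & 1:
        # xxx1: an integer; its value is oop/2 (assumed unsigned here)
        return ("integer", oop >> 1)
    if (oop & 3) == 0:
        # xx00: an object pointer, i.e. a byte displacement into the OT
        return ("object", oop)
    # xx10: not yet specified
    return ("unspecified", oop)
```

For example, classify_oop(0x0EFFC) reports an object pointer, while classify_oop(7) reports the integer 3.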

Physical Memory 

The bottom few K of physical memory are local to the processor. The entire space is divided as
follows (all addresses are byte addresses in hex):

00000   (CS points here)
        Local memory for interpreter, storage management, interrupt vector.
02000   (Local memory ends at 02000 assuming 8K bytes of it).
        OT is up to 15K entries of 4 bytes apiece.
10000
        Data area (actual Smalltalk objects reside here).
3FFFF   (end of physical memory 256 K bytes).
FFFF0
        Bootstrap locations for 8086
FFFFF   (end of virtual memory 1M bytes).

Object Table (OT) 

The object table contains a two word entry for each object and is indexed by the oop (plus
OTbase) off the code segment register. The bit format follows (note that I am showing the bits as
high:low but in memory the bytes are reversed):

rrrr rrrr uuut tttv (bytes 1,0)

ssss ssss ssss ssss (bytes 3,2)

where r is REF the reference count, u are unused bits (possibly for garbage collector status), t is
TOFF the core address offset, v is CLL (core longer than length, for inflated allocation), and s is
TSEG the core address segment origin. We will use the LDS instruction to load the first OT word
into register SI and the second word into segment register DS. Subsequent MOV instructions used
to access the data fields of the object compute the effective address as DS*16 + SI. Notice that the
v bit is anded out to give a zero bit there. Thus the TOFF field overlaps the TSEG field by one
bit. TOFF and TSEG are normalized so that negative offsets of up to 8 bytes are allowed on
SI,DS. After anding out the REF, u, and v fields of SI, the core address of the field ends up being
ssss ssss ssss ssss 0000 + t ttt0 + offset.
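The bit layout and the address arithmetic above can be checked with a small model. This is a sketch in Python, not the interpreter's code: the function names are hypothetical, and the mask value 01EH is the TOFFMSK quoted in the Mapping section. The 16-bit wrap of SI plus displacement is modeled the way the 8086 forms an effective address.

```python
TOFFMSK = 0x1E  # keeps bits 4..1 of the first OT word (the t bits, v cleared)

def unpack_ot_entry(word0, word1):
    """Split the two OT words into their named fields."""
    return {
        "REF": (word0 >> 8) & 0xFF,    # rrrr rrrr
        "unused": (word0 >> 5) & 0x7,  # uuu
        "TOFF": (word0 >> 1) & 0xF,    # tttt
        "CLL": word0 & 0x1,            # v
        "TSEG": word1 & 0xFFFF,        # ssss ssss ssss ssss
    }

def core_address(word0, word1, offset=0):
    """20-bit address of a field: TSEG*16 + (t ttt0) + offset."""
    si = word0 & TOFFMSK
    effective = (si + offset) & 0xFFFF          # 16-bit offset arithmetic
    return ((word1 << 4) + effective) & 0xFFFFF  # 20-bit physical address
```

With word0 = 0315 hex and word1 = 1000 hex, the entry has REF = 3, TOFF = 10, CLL = 1, and the zeroth word of the object sits at 10014 hex.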



Normalization applies to both objects and free blocks, thus the normalized core address in an OT
entry will survive many allocations and frees. A normalization is only done during an allocation
which must carve the core off a big block (this is also when a new OT entry is filled in). Objects are
renormalized during compaction.

The OT must lie below the data space. OT entries may point below the data space at the OT (for
empty entries on the freelist). They point into the data space (for objects and free blocks). They may
not point above the data space. The end of the OT is marked by a fake entry beyond the end. It
has REF = 0. A fake piece of core just beyond the end of the data space contains its oop
(ILLOOP). (The compactor uses this to know when to stop).

Segment Registers 

CS always points to base of current 8086 code (00000). CS is used to address the OT just above
the code.

SS points to the current context during interpretation. It is switched to the top of the stack space
(which grows down) while large chunks of 8086 code run.

DS changes continually and points to the segment of the current object (instance, message dict,
class, literal, or object reference).

ES points to segment of current method (the byte codes).

Remember to say 'seg cs' before accessing any variables in the code segment!

Mapping 

Given an OOP in BX, the code to find the object in the OT and load field 3 looks like this:

    SEG CS               ;using code segment
    LDS SI,OTbase!BX     ;load DS,SI with oop in BX off CS + OTbase
    AND SI,#TOFFMSK      ;AND out non-offset bits (TOFFMSK = 01EH)
    MOV AX,3!SI          ;load AX with field 3 of object (using DS)
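To see what this sequence does to memory, here is a Python simulation over a flat byte array. It is only a sketch under stated assumptions: OTBASE = 02000 hex is assumed from the memory map (the OT starting where local memory ends), CS is taken as 0, disp is treated as a byte displacement, and read_word, write_word and load_field are hypothetical helper names.

```python
OTBASE = 0x2000   # assumed: the OT begins where local memory ends
TOFFMSK = 0x1E    # mask quoted in the listing above

def read_word(mem, addr):
    """Read a 16-bit word, low byte first, as the 8086 stores it."""
    return mem[addr] | (mem[addr + 1] << 8)

def write_word(mem, addr, value):
    """Store a 16-bit word, low byte first."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF

def load_field(mem, oop, disp):
    """Mimic SEG CS / LDS / AND / MOV for the object named by oop."""
    entry = OTBASE + oop               # LDS SI,OTbase!BX with CS = 0
    si = read_word(mem, entry)         # first OT word  -> SI
    ds = read_word(mem, entry + 2)     # second OT word -> DS
    si &= TOFFMSK                      # AND SI,#TOFFMSK
    return read_word(mem, (ds << 4) + si + disp)  # MOV AX,disp!SI
```

A usage example: after write_word(mem, OTBASE + 8, 0x0114), write_word(mem, OTBASE + 10, 0x1000) and write_word(mem, 0x1001A, 0xBEEF), the call load_field(mem, 8, 6) fetches the word stored 6 bytes past the object's zeroth word.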

Format of an Object 

The 0th word of an object is pointed at by the OT. It contains the high 10 bits of the oop of the
class of the object. (The low 6 bits of the oop of a class must be zero). The low 6 bits of the
zeroth word are LC, the length code. The fields follow in the next words. Objects must begin on
even byte addresses.
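Because a class oop always has zeros in its low 6 bits, the zeroth word can be split with two masks. A brief Python sketch follows; the names CLASS_MASK, LC_MASK and split_zeroth_word are hypothetical.

```python
CLASS_MASK = 0xFFC0  # high 10 bits: the oop of the class
LC_MASK = 0x003F     # low 6 bits: LC, the length code

def split_zeroth_word(word0):
    """Return (class oop, LC) packed in an object's zeroth word."""
    return word0 & CLASS_MASK, word0 & LC_MASK
```

For instance, a zeroth word of 0C4A hex names the class whose oop is 0C40 hex and carries a length code of 10.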

Lengths of Objects 

It is always possible to tell if an object is in use or on a freelist by looking at the REF field of the
OT. A zero in the REF field means that the object is on a freelist. If so, its total length in bytes
is found in its zeroth core word. (There is also a freelist of empty OT entries with no core. They
each point at the next free OT entry and not at core and thus have length zero). Else REF is not
zero and the object is in use. If the LC field in the zeroth core word contains a zero, the object is
an octave object and the total length in bytes minus 2 is found in the word before the zeroth word.
(The OT still points at the word containing the class). If LC is not zero, it contains the total
number of bytes in the object (up to 63). The maximum size of an object is 64K-1 bytes. The
minimum size is 4 bytes (zero field objects are octave and thus have a 4 byte header). Note that
the length of any object is available without reference to any other object.
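The length rules above reduce to a short decision procedure. The following Python sketch is one reading of them, not the system's routine: total_length and its parameters are hypothetical names, and the caller is assumed to have already fetched REF from the OT and the relevant core words.

```python
def total_length(ref, word0, word_before=None, empty_entry=False):
    """Total length in bytes of an object or free block.

    ref         -- REF field from the OT entry
    word0       -- zeroth core word (ignored for an empty OT entry)
    word_before -- the word just before the zeroth word (octave objects only)
    empty_entry -- True for an OT entry on the (empty entry) freelist
    """
    if ref == 0:
        # On a freelist: empty entries have no core, free blocks keep
        # their byte length in the zeroth core word.
        return 0 if empty_entry else word0
    lc = word0 & 0x3F
    if lc != 0:
        return lc                      # small object: LC is the byte count
    if word_before is None:
        raise ValueError("octave object needs the word before the zeroth word")
    return word_before + 2             # that word holds total length minus 2
```

So an in-use object whose zeroth word is 0C4A hex is 10 bytes long, while one whose zeroth word is 0C40 hex (LC = 0) with 98 in the preceding word is 100 bytes long.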

Classes 



Of the 15K possible objects, 1K of them may be classes. The oop of a class has 0 in the low 6
bits. A special allocation call dispenses these valuable oops from the end of the OT empty entry
freelist where they were stashed in primordial times. If a non-class would suddenly like to become
a class, it must be renamed. Object renaming will be treated as a special primitive (like
allInstances) and will involve a scan of all fields in the system (~1 sec). Allocation of oops will be
from the bottom up with NIL being 0, false being 1, true being 2. All newly allocated objects will
have 0 in every field. Thus all pointer fields are initialized to NIL and all non-pointer fields are
initialized to 0 bits.



New Objects 

A field called Instspec in each class carries information about the instances of that class. All
instances are allowed to have two kinds of fields. Named fields are fields that every instance of
this class has (they are known by the compiler and come first in the instance). The LONG field in
Instspec encodes how many named fields instances of this class have. Extra fields are held only by
this instance (they are at the end of the instance). The object creation routine adds the number of
named fields to the number of extra fields to compute the length of the new instance. The call to
the creation routine supplies the number of extra fields. (Many objects will have extra fields.
Strings and Vectors have no named fields).

[...] associated with an object in use. Beware that it is in limbo between being created and being refi'd.
No compaction is allowed while an object has REF = 0 and is not on a freelist.

Storage Allocation 

Every piece of user core is pointed at by the OT at all times. All free core and empty OT entries
hang around on freelists. Lengths in words of (empty entry),2,3,4,5,6,7,8,9,10,11,12,13,14,15,16, and
BIG get separate free lists. A freelist head contains an oop, which points at core. The first core
location contains the byte length and the second the oop of the next entry on the list. (For the
(empty entry) list, TOFF has 0FH, an illegal value, and TSEG contains the oop of the next entry).
Free objects must have REF = 0 and CLL = 0. (The deallocation routine makes sure this is true).
The (empty entry) list must always have something on it. Other lists may be empty. No merging
of adjacent free blocks is done. Instead compaction unifies core when fragmentation sets in.
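Choosing the freelist for a block is then a matter of its length in words. The Python sketch below shows one way to express that choice; freelist_for and the two string labels are hypothetical, and a length of 0 is used here to stand for an empty OT entry with no core.

```python
EMPTY_ENTRY = "empty entry"
BIG = "BIG"

def freelist_for(length_in_words):
    """Name the freelist that holds blocks of the given length in words."""
    if length_in_words == 0:
        return EMPTY_ENTRY              # OT entry with no core behind it
    if 2 <= length_in_words <= 16:
        return length_in_words          # one list per exact size
    if length_in_words > 16:
        return BIG                      # everything larger shares one list
    raise ValueError("no block is 1 word long; the minimum size is 4 bytes")
```

Keeping one list per exact small size means a request for a small block never has to search, at the price of leaving fragmentation to the compactor.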

Objects may carry more core than they use. If the core longer than length bit (CLL) is on in the
OT, then the true core length is found above the top of the object (above HEADER or
OLENGTH). The idea behind this length inflation is so that 'one size fits all' for small objects
(avoids needlessly chopping up big blocks when a slightly too big small block is around) and so big
strings may grow without many allocations. (The first system will make minimal use of CLL). The
compactor shrinks all inflated objects.

Compaction 

A compaction occurs between the low water mark (LWM) and the end of the data space. The low
water mark is found by searching the lists of free blocks for the lowest one. For objects in the OT
with core above LWM, we stuff the contents of their 0th or -1th core word into TSEG. In the core
word we put their oop. In the low bit of the TOFF field (TOL) we put a 1 if they had a -1th
word and a 0 if they didn't (this information would be destroyed otherwise). During the core
sweep, free blocks are distinguished from objects by a zero in the REF field. The length of any
object or block may still be computed. After an object is moved, the new TSEG and TOFF are
computed by normalizing the current core location. No compactions are allowed while a recursive
free is in progress, since some state resides in blocks that are marked free.
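The overall shape of the sweep can be modeled at a much higher level. The Python sketch below is a deliberate simplification and not the compactor described above: core is a sorted list of (address, length, oop) blocks with oop = None for free blocks, the OT is a plain dictionary from oop to address, and the TSEG stash, the TOL bit and normalization are all left out. The name compact is hypothetical.

```python
def compact(blocks, ot):
    """Slide live blocks down over free ones, starting at the low water mark.

    blocks -- list of (address, length, oop_or_None), sorted by address
    ot     -- dict mapping oop -> address; updated in place
    Returns the new block list.
    """
    free_addresses = [addr for addr, _, oop in blocks if oop is None]
    if not free_addresses:
        return list(blocks)             # nothing to reclaim
    lwm = min(free_addresses)           # lowest free block = low water mark
    result = [b for b in blocks if b[0] < lwm]  # untouched below LWM
    dest = lwm
    reclaimed = 0
    for addr, length, oop in blocks:
        if addr < lwm:
            continue
        if oop is None:
            reclaimed += length         # free block: its space is unified
        else:
            result.append((dest, length, oop))
            ot[oop] = dest              # the OT follows the moved object
            dest += length
    result.append((dest, reclaimed, None))  # one free block at the top
    return result
```

In this model only the OT is updated when an object moves, which is the property the real scheme relies on: every piece of core is reached through its OT entry.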



Locking of Objects 

Each piece of machine code that wants to lock an object must reserve within itself a locking block.
The block contains places for an oop, an offset, and a segment. At assembly time the routine
informs the core compactor of the location of its locking block. To lock an object, the routine finds
out the current core address of the object and enters both it and the oop in the block. If any
object creation or storage allocation occurs between two successive uses of the core address, that
address must be reloaded from the block into the machine registers. (As you might guess,
whenever a compaction occurs, a cleanup routine goes through all the locking blocks and maps the
oops to get the new core addresses). Objects are always free to move during a compaction. Since
objects are not actually locked, they do not need to be unlocked. Format of a locking block:

    oop
    offset (low bits of core address)
    segment (high 16 bits)
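The locking block protocol can be sketched as follows. This is an illustrative Python model, not the assembly-time mechanism itself: LockingBlock, lock and cleanup_after_compaction are hypothetical names, core_address_of stands in for the OT mapping, and splitting a 20-bit address into its high 16 bits and low 4 bits is one reading of the format above.

```python
class LockingBlock:
    """Three slots reserved by a routine: oop, offset, segment."""
    def __init__(self):
        self.oop = None
        self.offset = 0
        self.segment = 0

def split_address(addr):
    """Split a 20-bit core address into (offset, segment)."""
    return addr & 0xF, (addr >> 4) & 0xFFFF

def lock(block, oop, core_address_of):
    """Record the oop and its current core address in the block."""
    block.oop = oop
    block.offset, block.segment = split_address(core_address_of(oop))

def cleanup_after_compaction(blocks, core_address_of):
    """Remap every registered block from its oop to the new address."""
    for block in blocks:
        if block.oop is not None:
            block.offset, block.segment = split_address(
                core_address_of(block.oop))
```

A routine that calls lock, triggers an allocation, and then reloads offset and segment from its block will see the address left there by cleanup_after_compaction rather than a stale one.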

Reading the Data Structure Address Space Diagrams 

Enclosed are diagrams that describe all the data structures and data paths in this vmem. To help
you understand them, I will walk through the first page. Find Fig 1b, OT Format and mapping to
fields. The format of an OT entry is in the upper left-hand corner. The two OT words appear
[...] TOFF through a memory reference to the core of this object. The index of a field within the
object is added to TOFF and TSEG in the proper alignment. The result is TADR, a 20 bit
quantity. The large square in the center converts the 20 bit address entering at the top to a selection
of one out of 2↑20 possibilities. Since the bottom address bit is 0 and since only some of the
values are ever used (objects don't start every 2 bytes), the words 'some 2' appear in the square.
Since we only have 256K out of 1M, three quarters of the values are unused. The values that are
used are expanded to the left. The large open arrow with the word 'byte' in it means that the
values are used as byte addresses to a memory. The words 'memstart' name the starting location of
this table in the memory. The braces on the left name two parts of the table. Individual examples
of places referenced in the memory are carried down to the lower part of the page. Three types of
objects are shown with various fields labeled.

In general, boxes are data structure formats except squares with diagonal lines which are 'bits to
choice' conversions. Braces and curved lines are correspondences between the same field in one
place and another. Words in boxes are field names. Numbers beside curved lines are numbers of
bits. Numbers beside vertical lines are maximums and minimums of value ranges. Numbers in
boxes are actual bit values. Text outside boxes are comments.



Hex Helper 



(hex      decimal  octal     english)
(01       1        01        one)
(02       2        02        two)
(04       4        04        four)
(08       8        010       eight)
(010      16       020       sixteen)
(020      32       040       thirty-two)
(040      64       0100      sixty-four)
(080      128      0200      one-twenty-eight)
(0100     256      0400      two-fifty-six)
(0200     512      01000     five-twelve)
(0400     1024     02000     1K)
(0800     2048     04000     2K)
(01000    4096     010000    4K)
(02000    8192     020000    8K)
(04000    2↑14     040000    16K)
(08000    2↑15     0100000   32K)
(010000   2↑16     0200000   64K)
(020000   2↑17     0400000   128K)
(040000   2↑18     01000000  256K)
(080000   2↑19     02000000  512K)
(0100000  2↑20     04000000  1M)



I will distribute this memo whenever there are major additions.

Ingalls Robson Fairbairn Kay Tesler Horn Kaehler Merry McCall Krasner Deutsch (11)



[Figures: the hand-drawn diagrams of the Fig 1 series belong here and are not reproduced. Captions that can still be made out include Fig 1b, OT Format and mapping to fields; Fig 1e, Objects on freelists; and Object format during a recursive free.]



